Interrater reliability: the kappa statistic
Abstract
The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While a variety of methods exist to measure interrater reliability, it was traditionally measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued the use of percent agreement due to its inability to account for chance agreement. He introduced Cohen's kappa, developed to account for the possibility that raters actually guess on at least some variables due to uncertainty. Like most correlation statistics, kappa can range from -1 to +1. While kappa is one of the most commonly used statistics to test interrater reliability, it has limitations. Judgments about what level of kappa is acceptable for health research are questioned. Cohen's suggested interpretation may be too lenient for health-related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and the levels of both that should be demanded in healthcare studies are suggested.
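To make the abstract's point concrete, here is a minimal Python sketch (not from the article itself) that computes both statistics for two raters scoring the same nominal variable; the ratings and function names below are hypothetical illustrations:

```python
from collections import Counter

def percent_agreement(rater1, rater2):
    # Proportion of items on which the two raters assigned the same score.
    matches = sum(a == b for a, b in zip(rater1, rater2))
    return matches / len(rater1)

def cohens_kappa(rater1, rater2):
    # Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    # agreement and p_e is the agreement expected by chance, computed from
    # each rater's marginal category frequencies.
    n = len(rater1)
    p_o = percent_agreement(rater1, rater2)
    freq1, freq2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    p_e = sum((freq1[c] / n) * (freq2[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1

# Hypothetical scores from two data collectors (1 = symptom present, 0 = absent)
r1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
r2 = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(percent_agreement(r1, r2))       # 0.8
print(round(cohens_kappa(r1, r2), 2))  # 0.52
```

In this example, 80% raw agreement shrinks to a kappa of about 0.52 once chance agreement (p_e = 0.58 for these marginals) is removed, which illustrates why percent agreement alone can overstate reliability and why a kappa threshold as low as 0.41 may be too lenient for healthcare data.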
Similar articles
Agreement, the F-measure, and reliability in information retrieval.
Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the kappa statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can...
Reliability of the visual assessment of cervical and lumbar lordosis: how good are we?
STUDY DESIGN Blinded test-retest design. OBJECTIVE To measure the intrarater and interrater reliability of the visual assessment of cervical and lumbar lordosis. SUMMARY OF BACKGROUND DATA Cervical and lumbar lordoses are frequently evaluated using visual assessment, but little attempt has previously been made to measure the reliability of visual assessment. METHODS Twenty-eight chiroprac...
Reliability of the NICMAN Scale: An Instrument to Assess the Quality of Acupuncture Administered in Clinical Trials
BACKGROUND The aim of this study was to examine the reliability of a scale to assess the methodological quality of acupuncture administered in clinical research. METHODS We invited 36 acupuncture researchers and postgraduate students to participate in the study. Firstly, participants rated two articles using the scale. Following this initial stage, modifications were made to scale items and t...
Comments on the article "can the ICF be used as a rehabilitation outcome measure? A study looking at the inter- and intra-rater reliability of ICF categories derived from an ADL assessment tool".
PURPOSE The categories of the International Classification of Functioning, Disability and Health (ICF) could potentially be used as components of outcome measures. Literature demonstrating the psychometric properties of ICF categories is limited. OBJECTIVE Determine the agreement and reliability of ICF activities of daily living category scores and compare these to agreement and reliability ...
Systemic lupus erythematosus disease activity index 2000 responder index-50 website.
OBJECTIVE To test the interrater and intrarater reliability of the Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2K) Responder Index (SRI-50), an index designed to measure ≥ 50% improvement in disease activity between visits in patients with systemic lupus erythematosus. METHODS This was a multicenter, cross-sectional study with raters from Canada, the United Kingdom, and A...